This exercise accompanies the lessons in Environmental Data Analytics on Data Exploration.
<FirstLast>_A03_DataExploration.Rmd (replacing <FirstLast> with your first and last name).TIP: If your code extends past the page when knit, tidy your code by manually inserting line breaks.
TIP: If your code fails to knit, check that no install.packages() or View() commands exist in your code.
library(tidyverse)
## Warning: 程辑包'tidyverse'是用R版本4.1.3 来建造的
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.9
## v tidyr 1.1.4 v stringr 1.4.1
## v readr 2.1.1 v forcats 0.5.1
## Warning: 程辑包'dplyr'是用R版本4.1.3 来建造的
## Warning: 程辑包'stringr'是用R版本4.1.3 来建造的
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
##
## 载入程辑包:'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
Neonics <- read.csv('D:/ENV872_DataExploration/ENV872_DataExploration_Fall2023/Data/Raw/ECOTOX_Neonicotinoids_Insects_raw.csv', stringsAsFactors = TRUE)
Litter <- read.csv('D:/ENV872_DataExploration/ENV872_DataExploration_Fall2023/Data/Raw/NEON_NIWO_Litter_massdata_2018-08_raw.csv', stringsAsFactors = TRUE)
Neonics
Litter
Answer: the specificity of these neonicotinoids is very important. If these neonicorinoids have a high specificity, other insects (that have important ecological impacts) can still be alive. Human and other mammals will also be safe. The knowing the ecotoxicology helps us to understand the specificity and protect our food and water resources for the current and future generations.
Answer: Woody debris has important roles in carbon recycle and can provide habitat to terrestrial and aquatic creatures. Litter debris, however, can be a source of plastic pollution. These litter debris can affect soil quality, negatively influence human and other animals’ health. Therefore, studying litter and woody debris is important to forest ecology and health of different creatures.
Answer: 1. Litter and fine woody debris sampling is executed at terrestrial NEON sites that contain woody vegetation >2m tall. 2. Ground traps are sampled once per year. Target sampling frequency for elevated traps varies by vegetation present at the site, with frequent sampling (1x every 2weeks) in deciduous forest sites during senescence, and infrequent year-round sampling (1x every 1-2 months) at evergreen sites. 3. In sites with forested tower airsheds, the litter sampling is targeted to take place in 20 40m x 40m plots.
colnames(Neonics)
## [1] "CAS.Number" "Chemical.Name"
## [3] "Chemical.Grade" "Chemical.Analysis.Method"
## [5] "Chemical.Purity" "Species.Scientific.Name"
## [7] "Species.Common.Name" "Species.Group"
## [9] "Organism.Lifestage" "Organism.Age"
## [11] "Organism.Age.Units" "Exposure.Type"
## [13] "Media.Type" "Test.Location"
## [15] "Number.of.Doses" "Conc.1.Type..Author."
## [17] "Conc.1..Author." "Conc.1.Units..Author."
## [19] "Effect" "Effect.Measurement"
## [21] "Endpoint" "Response.Site"
## [23] "Observed.Duration..Days." "Observed.Duration.Units..Days."
## [25] "Author" "Reference.Number"
## [27] "Title" "Source"
## [29] "Publication.Year" "Summary.of.Additional.Parameters"
# column names refer to the dimensions of dataset
summary function on the “Effect” column, determine the most common effects that are studied. Why might these effects specifically be of interest?summary(Neonics$Effect)
## Accumulation Avoidance Behavior Biochemistry
## 12 102 360 11
## Cell(s) Development Enzyme(s) Feeding behavior
## 9 136 62 255
## Genetics Growth Histology Hormone(s)
## 82 38 5 1
## Immunological Intoxication Morphology Mortality
## 16 12 22 1493
## Physiology Population Reproduction
## 7 1803 197
Answer: Population seems to be of interest as 1803 studies are population studies and mortality is the second (1493). The reason why population study is the most popular catagory may be that population study is the basic study for ecotoxicity. Most of the studies need to be done in population level to confirm the toxicity level before further research.
summary function, determine the six most commonly studied species in the dataset (common name). What do these species have in common, and why might they be of interest over other insects? Feel free to do a brief internet search for more information if needed.[TIP: The sort() command can sort the output of the summary command…]summary(Neonics$Species.Common.Name)
## Honey Bee Parasitic Wasp
## 667 285
## Buff Tailed Bumblebee Carniolan Honey Bee
## 183 152
## Bumble Bee Italian Honeybee
## 140 113
## Japanese Beetle Asian Lady Beetle
## 94 76
## Euonymus Scale Wireworm
## 75 69
## European Dark Bee Minute Pirate Bug
## 66 62
## Asian Citrus Psyllid Parastic Wasp
## 60 58
## Colorado Potato Beetle Parasitoid Wasp
## 57 51
## Erythrina Gall Wasp Beetle Order
## 49 47
## Snout Beetle Family, Weevil Sevenspotted Lady Beetle
## 47 46
## True Bug Order Buff-tailed Bumblebee
## 45 39
## Aphid Family Cabbage Looper
## 38 38
## Sweetpotato Whitefly Braconid Wasp
## 37 33
## Cotton Aphid Predatory Mite
## 33 33
## Ladybird Beetle Family Parasitoid
## 30 30
## Scarab Beetle Spring Tiphia
## 29 29
## Thrip Order Ground Beetle Family
## 29 27
## Rove Beetle Family Tobacco Aphid
## 27 27
## Chalcid Wasp Convergent Lady Beetle
## 25 25
## Stingless Bee Spider/Mite Class
## 25 24
## Tobacco Flea Beetle Citrus Leafminer
## 24 23
## Ladybird Beetle Mason Bee
## 23 22
## Mosquito Argentine Ant
## 22 21
## Beetle Flatheaded Appletree Borer
## 21 20
## Horned Oak Gall Wasp Leaf Beetle Family
## 20 20
## Potato Leafhopper Tooth-necked Fungus Beetle
## 20 20
## Codling Moth Black-spotted Lady Beetle
## 19 18
## Calico Scale Fairyfly Parasitoid
## 18 18
## Lady Beetle Minute Parasitic Wasps
## 18 18
## Mirid Bug Mulberry Pyralid
## 18 18
## Silkworm Vedalia Beetle
## 18 18
## Araneoid Spider Order Bee Order
## 17 17
## Egg Parasitoid Insect Class
## 17 17
## Moth And Butterfly Order Oystershell Scale Parasitoid
## 17 17
## Hemlock Woolly Adelgid Lady Beetle Hemlock Wooly Adelgid
## 16 16
## Mite Onion Thrip
## 16 16
## Western Flower Thrips Corn Earworm
## 15 14
## Green Peach Aphid House Fly
## 14 14
## Ox Beetle Red Scale Parasite
## 14 14
## Spined Soldier Bug Armoured Scale Family
## 14 13
## Diamondback Moth Eulophid Wasp
## 13 13
## Monarch Butterfly Predatory Bug
## 13 13
## Yellow Fever Mosquito Braconid Parasitoid
## 13 12
## Common Thrip Eastern Subterranean Termite
## 12 12
## Jassid Mite Order
## 12 12
## Pea Aphid Pond Wolf Spider
## 12 12
## Spotless Ladybird Beetle Glasshouse Potato Wasp
## 11 10
## Lacewing Southern House Mosquito
## 10 10
## Two Spotted Lady Beetle Ant Family
## 10 9
## Apple Maggot (Other)
## 9 670
sort(summary(Neonics$Species.Common.Name), decreasing = TRUE) # sort the output of species common names in a descending manor
## (Other) Honey Bee
## 670 667
## Parasitic Wasp Buff Tailed Bumblebee
## 285 183
## Carniolan Honey Bee Bumble Bee
## 152 140
## Italian Honeybee Japanese Beetle
## 113 94
## Asian Lady Beetle Euonymus Scale
## 76 75
## Wireworm European Dark Bee
## 69 66
## Minute Pirate Bug Asian Citrus Psyllid
## 62 60
## Parastic Wasp Colorado Potato Beetle
## 58 57
## Parasitoid Wasp Erythrina Gall Wasp
## 51 49
## Beetle Order Snout Beetle Family, Weevil
## 47 47
## Sevenspotted Lady Beetle True Bug Order
## 46 45
## Buff-tailed Bumblebee Aphid Family
## 39 38
## Cabbage Looper Sweetpotato Whitefly
## 38 37
## Braconid Wasp Cotton Aphid
## 33 33
## Predatory Mite Ladybird Beetle Family
## 33 30
## Parasitoid Scarab Beetle
## 30 29
## Spring Tiphia Thrip Order
## 29 29
## Ground Beetle Family Rove Beetle Family
## 27 27
## Tobacco Aphid Chalcid Wasp
## 27 25
## Convergent Lady Beetle Stingless Bee
## 25 25
## Spider/Mite Class Tobacco Flea Beetle
## 24 24
## Citrus Leafminer Ladybird Beetle
## 23 23
## Mason Bee Mosquito
## 22 22
## Argentine Ant Beetle
## 21 21
## Flatheaded Appletree Borer Horned Oak Gall Wasp
## 20 20
## Leaf Beetle Family Potato Leafhopper
## 20 20
## Tooth-necked Fungus Beetle Codling Moth
## 20 19
## Black-spotted Lady Beetle Calico Scale
## 18 18
## Fairyfly Parasitoid Lady Beetle
## 18 18
## Minute Parasitic Wasps Mirid Bug
## 18 18
## Mulberry Pyralid Silkworm
## 18 18
## Vedalia Beetle Araneoid Spider Order
## 18 17
## Bee Order Egg Parasitoid
## 17 17
## Insect Class Moth And Butterfly Order
## 17 17
## Oystershell Scale Parasitoid Hemlock Woolly Adelgid Lady Beetle
## 17 16
## Hemlock Wooly Adelgid Mite
## 16 16
## Onion Thrip Western Flower Thrips
## 16 15
## Corn Earworm Green Peach Aphid
## 14 14
## House Fly Ox Beetle
## 14 14
## Red Scale Parasite Spined Soldier Bug
## 14 14
## Armoured Scale Family Diamondback Moth
## 13 13
## Eulophid Wasp Monarch Butterfly
## 13 13
## Predatory Bug Yellow Fever Mosquito
## 13 13
## Braconid Parasitoid Common Thrip
## 12 12
## Eastern Subterranean Termite Jassid
## 12 12
## Mite Order Pea Aphid
## 12 12
## Pond Wolf Spider Spotless Ladybird Beetle
## 12 11
## Glasshouse Potato Wasp Lacewing
## 10 10
## Southern House Mosquito Two Spotted Lady Beetle
## 10 10
## Ant Family Apple Maggot
## 9 9
Answer: Honey bees are the most popular research subject because honey bees are ecologically important making them easy to transport pollutants. They are resilient to environmental stress. References: Cunningham MM, Tran L, McKee CG, et al. Honey bees as biomonitors of environmental contaminants, pathogens, and climate change. Ecol Ind. 2022;134:108457. https://www.sciencedirect.com/science/article/pii/S1470160X21011225. doi: 10.1016/j.ecolind.2021.108457.
Conc.1..Author. column in the dataset, and why is it not numeric?class(Neonics$Conc.1..Author.)
## [1] "factor"
view(Neonics)
Answer: The class of
Conc.1..Author.column is factor because some of the values are ‘NR’ and some values contains ‘/’.
geom_freqpoly, generate a plot of the number of studies conducted by publication year.plot1_NeoFreq <- ggplot(Neonics) +
geom_freqpoly(aes(x = Publication.Year), bins = 15) + theme_light() + labs(x = 'Publication Year', y = 'Number of publication')
plot1_NeoFreq
plot2_NeoFreqColor <- ggplot(Neonics) +
geom_freqpoly(aes(x = Publication.Year, color = Test.Location), bins = 15) + theme_light() + labs(x = 'Publication Year', y = 'Number of publication')
plot2_NeoFreqColor
Interpret this graph. What are the most common test locations, and do they differ over time?
Answer: The most common test locations are lab and field natural. Lab experiements have a trend of increasing from 1980 to around 2014 and decreasing afterwards. Similarly, the number of natual field experiments increases and decreases. The peak of the publications of natural field experiement is around 2009.
[TIP: Add theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) to the end of your plot command to rotate and align the X-axis labels…]
plot3_NeoBar <- ggplot(Neonics) +
geom_bar(aes(x = Endpoint)) + theme_light() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + labs(y = 'Counts')
plot3_NeoBar
Answer: NOEL and LOEL are the two most common end points. NOEL (No-observable-effect-level) and LOEL (Lowest-observable-effect-level) are common Endpoints for terrestrial database.
unique function, determine which dates litter was sampled in August 2018.class(Litter$collectDate) # the class of collectDate is factor
## [1] "factor"
Litter$collectDate <- as.Date(Litter$collectDate, Format = "%Y-%m-%d") # change factor into date
class(Litter$collectDate) # check the class again
## [1] "Date"
unique(Litter$collectDate) # to see which dates litter was sampled
## [1] "2018-08-02" "2018-08-30"
unique function, determine how many plots were sampled at Niwot Ridge. How is the information obtained from unique different from that obtained from summary?unique(Litter$fieldSampleID) # to see how many unique values are in the column fieldSampleID
## [1] NEON.LTR.NIWO061169.20180802 NEON.LTR.NIWO064103.20180802
## [3] NEON.LTR.NIWO067017.20180802 NEON.LTR.NIWO040205.20180802
## [5] NEON.LTR.NIWO041059.20180802 NEON.LTR.NIWO063062.20180802
## [7] NEON.LTR.NIWO047197.20180802 NEON.LTR.NIWO051045.20180802
## [9] NEON.LTR.NIWO058101.20180802 NEON.LTR.NIWO046155.20180802
## [11] NEON.LTR.NIWO062050.20180802 NEON.LTR.NIWO040205.20180830
## [13] NEON.LTR.NIWO041059.20180830 NEON.LTR.NIWO047197.20180830
## [15] NEON.LTR.NIWO051045.20180830 NEON.LTR.NIWO058101.20180830
## [17] NEON.LTR.NIWO063062.20180830 NEON.LTR.NIWO046155.20180830
## [19] NEON.LTR.NIWO062050.20180830 NEON.LTR.NIWO061169.20180830
## [21] NEON.LTR.NIWO064103.20180830 NEON.LTR.NIWO057081.20180830
## [23] NEON.LTR.NIWO067017.20180830
## 23 Levels: NEON.LTR.NIWO040205.20180802 ... NEON.LTR.NIWO067017.20180830
summary(Litter$fieldSampleID)
## NEON.LTR.NIWO040205.20180802 NEON.LTR.NIWO040205.20180830
## 10 10
## NEON.LTR.NIWO041059.20180802 NEON.LTR.NIWO041059.20180830
## 8 11
## NEON.LTR.NIWO046155.20180802 NEON.LTR.NIWO046155.20180830
## 10 8
## NEON.LTR.NIWO047197.20180802 NEON.LTR.NIWO047197.20180830
## 8 7
## NEON.LTR.NIWO051045.20180802 NEON.LTR.NIWO051045.20180830
## 7 7
## NEON.LTR.NIWO057081.20180830 NEON.LTR.NIWO058101.20180802
## 8 9
## NEON.LTR.NIWO058101.20180830 NEON.LTR.NIWO061169.20180802
## 7 9
## NEON.LTR.NIWO061169.20180830 NEON.LTR.NIWO062050.20180802
## 8 7
## NEON.LTR.NIWO062050.20180830 NEON.LTR.NIWO063062.20180802
## 7 7
## NEON.LTR.NIWO063062.20180830 NEON.LTR.NIWO064103.20180802
## 7 8
## NEON.LTR.NIWO064103.20180830 NEON.LTR.NIWO067017.20180802
## 8 8
## NEON.LTR.NIWO067017.20180830
## 9
Answer: Total of 23 plots were sampled at Niwot Ridge. Summary() returns the sample sites but also how many results are in the same site while unique() only returns the name of each unique sample site.
plot4_litterBar <- ggplot(Litter) +
geom_bar(aes(x = functionalGroup)) + theme_light() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + labs(y = 'Counts', x = 'Functional Group')
plot4_litterBar
geom_boxplot and geom_violin, create a boxplot and a violin plot of dryMass by functionalGroup.# box plot
plot5_litterBox <- ggplot(Litter) +
geom_boxplot(aes(x = functionalGroup, y = dryMass)) + theme_light() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + labs(y = 'Dry Mass', x = 'Functional Group')
plot5_litterBox
# violin plot
plot6_litterVio <- ggplot(Litter) +
geom_violin(aes(x = functionalGroup, y = dryMass)) + theme_light() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + labs(y = 'Dry Mass', x = 'Functional Group')
plot6_litterVio
Why is the boxplot a more effective visualization option than the violin plot in this case?
Answer: Violin plots may not work well when the sample size is small or there are multiple peaks rather than having a unimodal distribution. Therefore, in our case, the boxplot can be a more effection option which can also clearly show outliers.
What type(s) of litter tend to have the highest biomass at these sites?
Answer: Needles